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We present a comprehensive and versatile theoretical framework to study site and bond percola¬ 
tion on clustered and correlated random graphs. Our contribution can be summarized in three main 
points, (i) We introduce a set of iterative equations that solve the exact distribution of the size and 
composition of components in finite size quenched or random multitype graphs, (ii) We define a very 
general random graph ensemble that encompasses most of the models published to this day, and also 
that permits to model structural properties not yet included in a theoretical framework. Site and 
bond percolation on this ensemble is solved exactly in the infinite size limit using probability gener¬ 
ating functions [i.e., the percolation threshold, the size and the composition of the giant (extensive) 
and small components]. Several examples and applications are also provided, (iii) Our approach 
can be adapted to model interdependent graphs—whose most striking feature is the emergence of 
an extensive component via a discontinuous phase transition—in an equally general fashion. We 
show how a graph can successively undergo a continuous then a discontinuous phase transition, 
and preliminary results suggest that clustering increases the amplitude of the discontinuity at the 
transition. 

PACS numbers: 64.60.aq,64.60.ah,64.60.an,02.10.0x 

I. INTRODUCTION 

Percolation on graphs offers a simple theoretical frame¬ 
work to model and investigate the behavior of many 
complex systems; noteworthy examples being the growth 
and the robustness of their structure mm, their ob¬ 
servability M, as well as the effect of their structure 
on the propagation of emerging infectious agents mm- 
On the analytical front, recent progress has been mainly 
achieved within the Configuration Model (CM) paradigm 
|7j, which, in the limit of large graphs, allows an exact 
and simple analytical treatment with the use of probabil¬ 
ity generating functions (pgf) [5} [5]. The versatility of the 
pgf method has triggered the development of many vari¬ 
ants of the CM reproducing, to some extent, correlations 
and clustering found in real complex systems [1014311 . 

To move beyond what has been done thus far, we intro¬ 
duce a very general and comprehensive class of random 
graphs that increases significantly the nontrivial corre¬ 
lations and clustering patterns that can be handled an¬ 
alytically. Correlations and clustering are incorporated 
into the graphs through the use of types of vertices and 
types of stubs (i.e., half-edge stemming from vertices). 

Hence, by explicitly controlling who is connected to whom 
and through what kind of connection , our approach re¬ 
produces any correlations as long as they can be mapped 
unto this multitype framework. For instance, the type of 
the vertices can correspond to their degree (the number 
of neighbors) [32] ED] , to their intrinsic properties such as 
age or ethnicity ini eh, or to their position in the k-core 
structure of the graph [16]. 


* Now at Departament de Fi'sica Fonamental, Universitat de 
Barcelona, Carrer de Marti' i Franques 1, 08028 Barcelona, Spain 
t Now at Santa Fe Institute, Santa Fe, New Mexico 87501, USA 


Furthermore the use of types of stubs explicitly ac¬ 
counts for different categories of connections. On the 
one hand, these differences may be of a conceptual na¬ 
ture [32j . For instance in multilayer or multiplex graphs 
the type of an edge refers to the layer of interaction to 
which it belongs (e.g., family ties and acquaintances in 
social networks). On the other hand, the different types 
of stubs can describe different topological functions. Since 
some edges may be undirected or directed, different types 
of stubs can be used to identify in-degrees , out-degrees or 
undirected degrees [9[ [20]. More importantly perhaps, 
stubs can be matched in groups of more than two ver¬ 
tices to form motifs , also called hyperedges BUIE], per¬ 
mitting the inclusion of clustering in a very general and 
natural fashion. These motifs can take a wide variety 
of forms: simple triangles, cliques of several hundreds of 
vertices, or arbitrary graphs with directed and multiple 
edges [see Fig. m- Additionally, these motifs can have 
a quenched (i.e., fixed) or a random structure (e.g., mul¬ 
titype Erdos-Renyi graphs). 

We have developed a mathematical framework that 
solves the site and bond percolation (hereafter hybrid 
percolation) on this general class of random graphs. We 
build upon the well-known pgf-based formalism and ob¬ 
tain the analytical expression for the size of the extensive 
“giant” component, the percolation threshold, as well as 
the distribution of the size of the “small” components 
in the limit of large graph size. However, the pgf ap¬ 
proach de facto assumes locally tree-like graphs forbid¬ 
ding closed loops and therefore any clustering whatso¬ 
ever. To circumvent this limitation, we present a set of 
iterative equations that exactly solves the size distribu¬ 
tion of components in finite-size arbitrary, or quenched, 
graphs. These equations map the possible outcomes of 
hybrid percolation on any motifs (i.e., the size distribu¬ 
tion of the components) unto a distribution of branching 
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trees, and thereby reconcile the presence of motifs with 
the tree-like requirement of the pgf approach. 

The general nature of our model acts as a theoretical 
laboratory where the effect of a wide selection of struc¬ 
tural features on the outcomes of hybrid percolation can 
be investigated on a common ground. To facilitate under¬ 
standing and to provide support for our claims, several 
examples accompany the analysis and illustrate its prac¬ 
tical implementation. Moreover, our model encompasses 
most variants of the CM published to date, we provide 
several examples supporting this claim as well. 

Finally, we show how our approach can be adapted 
to model interdependent graphs—in which the exten¬ 
sive component emerges via a discontinuous transition 
instead of a continuous one [381184] — through a suitable 
change in the definition of what constitutes an extensive 
component (i.e., the order parameter). This adaptation 
show that a graph can successively undergo a continuous 
then a discontinuous phase transition |)35] , and provide 
a quantitative measure of the effect of clustering on the 
emergence of the extensive component. 

The paper is organized as follows. In Sec. [TIJ we in¬ 
troduce the set of iterative equations that exactly solves 
the size distribution of components in finite-size arbitrary 
graphs. In Sec. m we formally define the general graph 
ensemble discussed above and obtain its exact structural 
properties under hybrid percolation (i.e., the size and 
composition of the components and the position of the 
percolation threshold). We then illustrate the workings 
of our formalism with several examples and special cases 
in Sec. IV We finally show how our approach can be 
adapted to model interdependent graphs in Sec . |V| Con¬ 
clusions and final remarks are collected in Sec. IVI1 


individually and independently with a probability p mi- 
We generalize this model by labeling vertices using types; 
the set of types is noted Af, and there are a total of \AF\ 
types of vertices. A directed edge from a vertex of type 
i towards a vertex of type j (noted i —> j) exists with 
a probability p t j independently of other potential edges 
|38| . For the sake of conciseness, we will refer to a graph 
composed of n* vertices of type i (with i = 1,..., |A/"|) 
with the vector n = (ni,..., n|j\/|) T . We will use a sim¬ 
ilar notation for other quantities throughout this paper, 
unless specified otherwise. 

Since edges may be directed, we define a component as 
the vertices that are reachable from a given initial ver¬ 
tex, including itself (i.e., the out-component rooted to 
this given vertex). This initial vertex is identified solely 
by its type since vertices of a given type are indistin¬ 
guishable. We define Wi(l\n) as the probability that 
l = (?i,..., ) T vertices can be reached from an ini¬ 
tial vertex of type j in a graph containing n vertices. 
The calculation of Wi{l\n) begins with the initial condi¬ 
tion Wi(Si\5i) = 1, where Si is the vector of Kronecker 
deltas (<5ii,..., <5 ^|a/"| ) T and corresponds to a single vertex 
of type i. This initial condition simply states that the 
probability of finding a component of one vertex of type 
i in a graph containing one vertex of type i is 1. Now 
suppose that there are other vertices in the graph and 
that it contains n vertices instead. The probability of 
finding a component of only one vertex of type i in this 
graph, Wi(8i\ri), is equal to the probability that there is 
a component containing Si vertex, Wi(Si\Si), (which in 
this case is equal to one) multiplied with the probability 
that none of the potential edges from the vertex in the 
component (i.e., the vertex of type i) towards the other 
vertices of the graph of size n exist 


II. PERCOLATION ON FINITE-SIZE 
ARBITRARY GRAPHS 

To reconcile the tree-like assumption of the pgf ap¬ 
proach with the presence of motifs in graphs, the out¬ 
comes of percolation on these motifs—the distribution 
of the number of vertices that can be reached from a 
given vertex—must be obtained beforehand. These dis¬ 
tributions can be computed by hand by enumerating each 
possible configuration where vertices and edges exist with 
given probabilities H7HH]- However, this procedure be¬ 
comes rapidly unwieldly for motifs of more than a hand¬ 
ful of vertices. Instead, we generalize the equations pre¬ 
sented in Ref. [36j to obtain a set of iterative equations 
that solve the outcome of hybrid percolation on small 
arbitrary graphs. 


WdSi\n) = WiiSilSi) lid- Pik) nh ~ Sik • (1) 

fceY 

Let us now consider a component made of 2 vertices of 
type i, that we note 2Si. By definition of the multitype 
random graphs, we know that Wj{2Si\2Si) = pa since 
the component exists only if there is a directed edge from 
the initial vertex to the other vertex of type i. Following 
the steps leading to Eq. |l]), the probability of finding a 
component of size 2Si from an initial vertex of type i in 
a graph containing n vertices (we assume that n* > 2) is 

Wi(2Si\n) = Wi(2Si\28i)(rii - 1) JJ (l - p lk )^ nk ~ 2S ' k) , 

fceY 

( 2 ) 


A. Multitype Erdos-Renyi graphs 

Let us first consider multitype random graphs as a gen¬ 
eralization of the Qn iP model (i.e., Erdos-Renyi random 
graphs) in which n vertices are linked by edges that exist 


where the extra factor (rij — 1) accounts for the number of 
ways to choose the second vertex among the n, — 1 avail¬ 
able vertices of type i in the graph, and 2(rik — 2Sik ) is the 
number of potential edges from the two vertices of type i 
towards the rik — 2 Sik vertices of type k. Repeating this 
exercise for larger components, we obtain the following 
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FIG. 1. (Color online) (a) Example of an arbitrary graph that can be handled by our framework. Blue and red represent 
vertex types 1 and 2. There are 10 vertices of each type. m Distribution of the number of vertices of type j in components 
reached from an initial vertex chosen at random for the graph shown in (a) Symbols are the results of 2.5 x 10 s numerical 
simulations, and lines are the predictions of Eqs. ( |3a| )-( [3d| ). The distributions are discrete; lines have been added to guide the 
eye. Triangles (A) correspond to pure site percolation with {f}} = {r},^} = {0.40,0.70} and {p[j} = {Pii,Pi 2 ,P 2 hP 22 } = 
{1.00,1.00,1.00,1.00}. Circles (o) correspond to hybrid percolation (site and bond) with {f}} = {r},?^} = {0.95,0.90} and 
{p'ij} = {pi i, P 12 5 P 21 5 P 22 } = {1-00, 0.90, 0.85, 0.95}. These probabilities are given in terms of the original vertex types to lighten 
the presentation (thus the use of a prime). To use the mapping described in Sec. II B[ a probability for each individual vertex 
and each individual edge must be defined. For instance, if vertex 8 is of type 1, we set rg = f[. Similarly, if vertex 5 is of type 
2 and shares three undirected edges with vertex 8 , we set p$s = 1 — (1 — P 21) 3 and Pss = 1 — (1 — p' 12 ) 3 - 


general form for a generic component of size l 

Wi{i\n)=wMi) n (''' t') n (i -p ^ {nk ~ ik) , 

jeAf ' J ' fceAf 

(3a) 

where is the probability that l vertices form a 

component considering an initial vertex of type i. In this 
last equation, the binomial coefficients count the number 
of ways the other l — <5j vertices in the component can be 
chosen from the n — Si vertices available in the graph, 
and the other terms correspond to the probability that 
no other vertices can be reached from the l vertices in 
the component. 

The only missing information in Eq. ( |3a| ) is the proba¬ 
bility Wi(l\l). As seen in the two simple examples above, 
it is possible to compute the probability Wi{l\l) by hand, 
but this calculation becomes rapidly tedious as the size 
of the component increases. Fortunately, we can use 
Eq. (3a) to circumvent this difficulty. For example, from 
Wi(d~ldi) = 1, Eq. |T]) yields Wi(Si\2di) = 1 - Va- Since 
the probabilities must sum to 1 for a given graph size, we 
conclude that Wi(2Si\2Si) = 1 — W,(<5j|2di) = pa- Hence 
it is possible to build upon the probabilities computed 
for smaller graph size to obtain the missing probability 
Wi(l\l) by simply asking for normalization. In general 
terms, 

Wi(l\l) = 1 - ^ Wi(m\l) , (3b) 

m<l 

where the probabilities Wi(m\l) are obtained with 


Eq. (3a), and where the sum covers every possible in¬ 


stances of m such that rrij < lj for all j but excludes the 
case in which all elements of the two vectors are equal 
(i.e., m.j = lj for every j). In short, Eqs. (3a)- (3b) are 
mutually dependent: the left-hand side of one feeds the 
right-hand side of the other. Thus, from a graph consist¬ 
ing of a single vertex (the initial condition), Eqs. (3a) 


(3b) extend the graph to the desired size n, and keep 


track of the component size distribution along the way 
to build the final distribution {Wi(l\n)}. 

A mass of information is produced during the iteration 
of Eqs. (3a)- (3b): the probability of finding every possi¬ 


ble components l in each intermediate graph whose size 
is smaller than n. When interested in bond percolation 
solely (as in Ref. jM|), the only probabilities of interest 
are the ones related to the graph of maximum size n. 
This ultimately leaves most of the calculated probabili¬ 
ties unused. However, if interested in hybrid percolation, 
that is when edges and vertices exist with given prob¬ 
abilities, all the calculated probabilities can be put to 
contribution. 

The probability for a graph of original size n to be left 
with b vertices after each of its vertices has been inde¬ 
pendently kept with probabilities {rj}j g w (i.e., a vertex 
of type j is kept with probability rj ) is 


Bi{b\n) = 

je N 


. (3c) 


where we assume that the initial vertex of type i exists. 
Hence, from a starting vertex of type *, the probability 
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to find a component of size l in a graph of original size 
n when vertices and edges exist with given probabilities, 
Qi(l\n), is 

n 

Qi{l\n) =Y J W i {l\b)B i {b\n) , (3d) 

b—l 

where the sum covers every possible instances of b such 
that lj < bj < rij for every j £ Af. Thus, by slightly 
increasing the computational effort, it is possible to in¬ 
corporate site percolation into the systematic method in¬ 
troduced in Ref. [36] for bond percolation. 


B. Arbitrary graphs 


Our framework can also be used to predict the out¬ 
comes of hybrid percolation on small arbitrary graphs. 
By small arbitrary graphs, we mean graphs of finite size 
with a fixed structure, in which edges may be directed 
and/or multiple, and whose vertices may belong to types. 
We use the adjacency matrix A, whose element A t j is 
the number of edges leaving vertex i towards vertex j, 
to specify the structure. Figure [ipOl depicts an example 
of such an arbitrary graph. In such graphs, percolation 
corresponds to the random removal of edges and vertices 
according to some given probabilities which may depend 
on the type of the vertices involved. Predicting the out¬ 
come of percolation then consists in predicting the prob¬ 
ability that a component of size l can be reached from a 
given initial vertex in a graph of size n. 

For Eqs. (3aI (3d) to be applicable, we need to map 


arbitrary graphs unto multitype random graphs. This 
mapping is achieved by assigning to each vertex its own 
type (|A/ - 1 equals the number of vertices), and by set¬ 
ting the probabilities {pij} to mimic the structure of the 
original arbitrary graph. To account for the fact that 
more than one edge may exist between two vertices in 
the original graph, we set = 1 — (1 — Pij) Aij , where 
pij is the probability that an individual edge from vertex 
i to vertex j remains after the random removal of edges 
in the arbitrary graph. Note that p^ may depend on the 
original types of vertices i and j in th e graph [e.g., there 
are two types of vertices in Fig. 0EH The same applies 
for the existence probabilities of vertices (i.e., {r^} must 
be equal to {r,}). An example is given in the caption 
of Fig. [I] Using this mapping, Eqs. ( [3a] ) ( fiid] ) offer a 
systematic procedure to compute the outcomes of hybrid 
percolation on small arbitrary graphs. Figure [fjjb)] com¬ 
pares the predictions of Eqs. pa|)-(|3d|) with the results of 


numerical simulations for the arbitrary graph shown in 
Fig. f 'a) As expected, a perfect agreement is observed. 


III. PERCOLATION ON CORRELATED AND 
CLUSTERED INFINITE RANDOM GRAPHS 


We now turn our attention to the generalized version of 
the CM briefly described in the Introduction. We provide 


a formal definition of the model, and analytically solve 
percolation for this general ensemble of random graphs. 


A. A stub matching scheme 

The CM defines an ensemble of graphs that are random 
in all respects except for the degree of their vertices (the 
number of neighbors) which is prescribed by a given dis¬ 
tribution {P(k)}ken- More precisely, to generate graphs 
of this ensemble we start with N vertices and assign a de¬ 
gree to each one by drawing an integer from {-P(fc)}fceN- 
We then build a list of stubs (half-edges) in which a ver¬ 
tex whose degree is k appears k times. We shuffle the list 
and pair stubs according to this randomized list to create 
edges. Up to corrections of order 0(N~ 1 ), this procedure 
uniformly samples the ensemble of graphs with a given 
degree distribution |7]. Moreover, as closed loops also 
occur with a probability proportional to IV , this pro¬ 
cedure generates graphs that are locally tree-like in the 
limit N —> oo. 

We generalize this scheme to account for types of ver¬ 
tices and types of stubs. In our model, each of the N 
vertices belongs to a type and we note Af the set of 
vertex types, as in the last section. We also note Wi 
the fraction of vertices whose type is i. As in the CM, 
vertices are assigned a number of stubs, but now these 
stubs are identified with types as well. We say that a 
vertex has k a stubs of type a, and we note £ the set 
of stub types. Unless specified otherwise, Greek and 
Latin letters refer to types of edges and vertices, respec¬ 
tively. The number of stubs of each type belonging to a 
vertex of type i is prescribed by the joint degree dis¬ 
tribution {P i (fcl,...,/C|£|)}fe 1) ..., fe|£|eN = {Pi(fe)} fc£N |£|. 
Hence, when generating graphs from this ensemble, each 
of the N vertices is assigned a type according to {wiji^jy 
and then assigned a number of stubs of each type accord¬ 
ing to the corresponding joint degree distribution. 

To generate graphs from this sequence of vertices, we 
build a list of stubs for each pair (a, i) where a £ £ and 
i £ Af. For example, a vertex of type i that has k a stubs 
of type a and kp stubs of type /3 appears k a times in the 
list (a,i) and kp times in the list (/?,*). Stubs are then 
randomly matched according to a set of rules—noted 7Z- 
to generate graphs. The information encoded in these 
rules is twofold. On the one hand, they prescribe from 
which lists should stubs be picked during the matching 
step. Mathematically, this is encoded in the distribution 
{R(n)} neNl £ l*l A ''l where n is a matrix whose elements, 
n a i (for every a £ £ and i £ Af), give the number of 
stubs from each list involved in the edge (or hyperedge , if 
more than two stubs are involved). The probability that 
an hyperedge contains n stubs is then R(n). 

On the other hand, the rules 1Z prescribe how the ver¬ 
tices are connected to one another within the hyperedge. 
For example, stubs from the list (a,i) and (a,j) could 
be paired to create undirected edges between layers i 
and j of multilayer graphs. Similarly, stubs from the 
















5 



FIG. 2. (Color online) Illustration of the stub matching scheme, (a) vertices are attributed a type according to the distribution 
{w,}, and are given a number of stubs of each type according to the distribution (Pi(fc)}. There are IV = 12 vertices, \Af\ = 3 
types of vertices (4 vertices of type 1 in red, 4 vertices of type 2 in green, and 4 vertices of type 3 in blue), and \£\ = 2 types 
of stubs (light blue and dark red). Stubs are then randomly matched according to a set of rules, 1Z, to create hyperedges. For 
example, a light blue stub and a dark red stub that both stem from vertices of type 2 can be matched to create a directed edge, 
or three light blue stubs stemming from three vertices of type 1, 2 and 3 can be matched to create a triangle. More complex 
hyperedges are possible and can be handled by our mathematical approach. |(b)| Example of a graph obtained with the stub 
matching scheme which can reproduce a great variety of nontrivial correlations and clustering patterns. In the infinite size limit 
(N —> oo), the resulting graphs have an underlying tree-like structure: there are no closed path other than within clustered 
liyperedges. 


lists (f3,i) and ( 7 ,*) could be paired to create directed 
edges between vertices of a same type (the two types 
of stubs corresponding respectively to the m-degree and 
owl-degree). Moreover, three stubs from a same list could 
be matched to create triangles, or m stubs of type e stem¬ 
ming from different types of vertices could be matched 
to form a multitype Erdos-Renyi motif where edges exist 
with probability p (see Sec. II A). In fact, the hyperedges 


can take any imaginable form and composition as long as 
they can be mapped unto the multitype random graphs 
defined in Sec. [Tl} Note that only one stub is required 
to be part of an hyperedge, even if this hyperedge con¬ 
tributes to more than one to the degree of vertices. For 
instance, if stubs of type A correspond to triangles, a 
vertex with = 2 will belong to two triangles. An il¬ 
lustration of the stub matching scheme is given in Fig. [5] 
For this graph ensemble to be consistent, the distri¬ 
butions {Pi(fe)} fc6N |£ and {I?(n)} Tl6N |£|x|.A/'i must obey 
certain constraints in the limit N —> 00 . Namely 


Wj(k a ) Pi _ (■ n ai ) R 

wAk v ) Pi (n vi ) R 


(4) 


for each i, j £ Af and a, v £ £, where ( x)y represents the 
average of x according to the distribution Y{x). These 
constraints simply require that the ratio of the average 
number of elements in each list (left) equals the relative 
proportion in which pairs appear in hyperedges (right). 

As for the CM, this stub matching scheme uniformly 
samples—up to corrections of order 0(N ~ 1 )—a maxi¬ 


mally random ensemble of graphs defined by the distri¬ 
butions {iVi}i G j\f and {Pi(fc)} i&A f. fc6N |£|, and by the rules 
1Z. Since stubs are matched randomly, the graphs of 
that ensemble have an underlying tree-like structure in 
the limit N —> 00 except within clustered hyperedges. 


B. Probability generating functions 

To solve percolation on this general ensemble of ran¬ 
dom graphs, we adapt the well-known pgf approach 
[9, HO] to account for vertex and stub types. As men¬ 
tioned above, this approach assumes that the structure 
of the graphs is locally tree-like, an assumption that is not 
valid whenever an hyperedge contains a loop (e.g., a tri¬ 
angle). However, by solving the component size distribu¬ 
tion on each hyperedge beforehand, it is possible to con¬ 
sider that the hyperedge has an effective tree-like struc¬ 
ture: the probability that there is an effective edge from 
vertex A to vertex B is simply the probability that vertex 
B can be reached from vertex A either directly or through 
the other vertices in the hyperedge. Figures [^a)] [(b)] il¬ 
lustrate the idea behind the effective tree-like structure. 
This slight change of perspective allows the use of the pgf 
approach even though the tree-like structure assumption 
is not valid in the original graph ensemble. 

The effective tree-like structure of hyperedges is un¬ 
veiled with Eqs. §, where a vertex is now identified by 
the pair (a, i ) instead of by its vertex type solely. In other 











(a) Before projection 


(b) After projection 


(c) Representation of fu,i(x) 


FIG. 3. (Color online) |(a) (b) Effective tree-like structure of an hyperedge from the point of view of a vertex of type 2. There 
exists an effective edge between the initial vertex of type 2 and any vertices that are directly or indirectly reachable from it. 
The probability for an effective edge to exist corresponds to the probability that a direct or indirect path exists. |(c)| Schematic 
representation of the pgf f ll i(x ). Knowing that a vertex of type i has been reached from one of its stubs of type p [i.e., a pair 
(p, i)], this pgf generates the distribution of the number of vertices of each type in its neighborhood, as well as the type of 
stubs from which they have been reached. The types of the vertices and of the stubs are identified with the subscripts of the 
variables x = {x„i}. 


words, we keep track of the type of the vertices but also 
the type of the stubs through which they are involved in 
the hyperedge. As a result, bold variables like n and l 
now contain |£| x Af\ elements instead of the \AT\ ele¬ 
ments as in Sec. |TI The quantity Q a i(l\n\TZ) therefore 
corresponds to the probability that a pair (a, i) leads to l 
pairs—i.e., l U j pairs (v,j), for each is £ £ and j £ N —in 
an hyperedge containing n pairs. A dependency on the 
rules 1Z has been added in Q Q i(i|n; 1Z) to explicitly mark 
that the inner structure of the hyperedges (e.g., quenched 
or random nature, probabilities of existence of vertices or 
edges) is prescribed by these rules [32]. 

The pgf that generates the distribution of the number 
of pairs that can be reached from an initial pair {a, i) in 
an hyperedge containing n pairs is 

n 

E Q™(i\n-,iz) Yl x l ;r s ^ ( 5 ) 

l—S a oSi v(zl£ 

je M 

where the sum covers every possible instances of l such 
that 5^6 m i < liim < for every m £ N and p £ £ 
(“o” denotes the entrywise product). These two deltas 
of Kronecker account for the fact that there is at least 
one pair (a, *) in the hyperedge. Similarly, the two oth¬ 
ers deltas of Kronecker S av Sij appearing in Eq. © re¬ 
move the initial pair—by definition included in l from 
the count of reachable pairs. Because we are ultimately 
interested in the number of pairs that can be reached 
from a given initial pair regardless of the specifics of the 
hyperedge, we must remove the dependency of Eqs. ([5]) 
on the composition n. To do so, we average this pgf 
over the probabilities that the initial pair (a, i) belongs 
to an hyperedge whose composition is n. Since stubs are 
matched randomly, a vertex identified by the pair (a, i) is 
ten times more likely to belong to an hyperedge contain¬ 
ing ten pairs (a, i ) than to belong to an hyperedge that 
contains only one pair Consequently the proba¬ 

bilities R(n) must be weighted by the number of pairs 


( a,i ) that each composition contains, i.e., averaged over 
n a iR(n) / (n a i) r, where the normalizing factor, ( n a i)R , is 
the average value of n a i with respect to the distribution 
R(n) . Doing so yields the pgf generating the distribution 
of the number of pairs of each types that can be reached 
from a pair (a, i ) 

0aiO) = E n r jR \ W ' > Q a i(l\n;TZ) 

n \n ai /R l=SaOS . ve£ 

( 6 ) 

where the sum over n covers all hyperedge compositions 
such that R(n ) ^ 0. Computed for each initial pair (a, *), 
6 ai {x) provides the projection of the outcomes of perco¬ 
lation on the hyperedges unto an effective branching tree 
and therefore permits the use of the pgf approach. 

To solve percolation on the graphs defined in the pre¬ 
vious subsection, we first need to compute the distri¬ 
bution of the composition of the neighborhood of ver¬ 
tices. The neighborhood of a vertex is the set of reach¬ 
able vertices with which it shares an hyperedge. In other 
words, vertex B is a neighbor of vertex A if there ex¬ 
ists an effective edge from vertex A to vertex B. The 
pgf 9 ai (x) generates the distribution of neighbors that 
a vertex of type i has through one of its stubs of type 
a. In the limit N —> oo, the tree-like structure of the 
graphs ensures that the neighboring vertices reachable 
through two different stubs do not overlap. Hence, the 
composition of the neighborhood of a vertex of type i 
that has k a and kp stubs of type a and /3 is generated 

by [0 ai {x)] k °‘ ■ [dp i{x )\ kfl , which corresponds to the con¬ 
volution of the distributions. Since the number of stubs 
belonging to vertices of type i is distributed according 
to {T)(fe)} fcgN |£|, we obtain that the distribution of the 
composition of the neighborhood of vertices of type i is 
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generated by 

9i{x) =^2Pi(k) [e ai (x)] ka , (7) 

k a(zE 

where the sum covers all cases where Pi(k) ^ 0. As 
in 6 ai (x ), this pgf keeps track of the type of the 
stubs from which the neighboring vertices have been 
reached through the subscripts of the variables x = 
{xi/j}ve£-J&AT- I n other words, gi{x) generates the num¬ 
ber of pairs that are in the neighborhood of a vertex of 
type i. This pgf is analogous to the function Gq(x ) gen¬ 
erating the degree distribution in the CM BE]. 

The complete solution to the percolation problem re¬ 
quires the distribution of possible neighborhood compo¬ 
sitions for vertices reached through one of their stubs. As 
discussed for 9 ai {x), the probability for a stub of type /i 
to be attached to a vertex with a total of k stubs (i.e., 
k a stubs of type a for every a £ £) is weighted by the 
number of stubs of type /r that this vertex has. Hence, 
given that a vertex of type i has been reached through 
one of its stubs of type g [i.e., a pair (/q i)], the composi¬ 
tion of the neighborhood accessible from its other stubs 
is generated by 

Ui(x) = ]T n , ( 8 ) 

k ' v! p i ae£ 


where the delta S a ^ has been added to exclude from the 
count the stub of type g from which the vertex has been 
reached, and where (k lt )is the average number of stubs 
of type g that vertices of type i have. The distributions 
generated by f^i(x) are analogous to the excess degree 
dist ribut ion generated by G\(x) in the CM BE- Fig- 
^Fc) illustrates the information encoded in the pgfs 


ure 


Ui(x.)■ 


C. Extensive “giant” component 

Having defined the pgfs gi(x) and /^(x), the behav¬ 
ior of the extensive “giant” component can be predicted 
in the limit N —> oo using simple self-consistency argu¬ 
ments. We define a^i as the probability that a vertex of 
type i reached via one of its stubs of type /i does not lead 
to the giant component. Self-consistency then requires 
that if this pair does not lead to the giant component, 
then neither should the pairs that are reachable from it. 
Since the distribution of the number of pairs reachable 
from a given pair {g,i) is generated by Eq. this self- 
consistency requirement can be rewritten as 

= fp,i( a ) (9) 

for every g S £ and i S M. Because the coefficients 
of ffii[x) are normalized (they form a probability distri¬ 
bution), the point a = 1 (every a^i equals 1) is always 
a solution of Eqs. However, as the density of edges 
and/or vertices increases with increasing and/or 


{Pjk}j,keNi another solution where at least one element 
of a is smaller than 1 appears. This new solution marks 
the emergence of an extensive component. 

Because their coefficients are all positives, the pgfs 
f/j,i(x) are all convex and monotonic increasing in 
[0,1]I £ I X WI, Hence when a = 1 is the only solution of 
Eqs. & in [0, 1]I £ I X I A/ ’I, it is the stable fixed point of (with 
neN) 


a (n+ 1) = /(a< n) ) , 


( 10 ) 


for any initial condition a/°) in [0,1]I £ I X I A/ ’I ) and where 
the map f(x) consists of every x). This fixed point 
becomes unstable through a transcritical bifurcation as 
soon as another solution in [0,1] 1^1 x appears. The 
shape of f(x) in [0, l]l f l x l^l and the fact that /(1) = 1 
implies that this other solution is unique in the interval 
of interest, that it is a stable fixed point of Eq. (101, and 


that the transition is continuous. Analyzing the stabil¬ 
ity of f(x) around the fixed point a = 1 leads to the 
criterion for the emergence of the giant component 


det(J - I) = 0 , 


( 11 ) 


where J is the Jacobian matrix of fix) around x = 1, 
and I is the identity matrix. Put differently, an extensive 
component exists whenever the largest eigenvalue of J, 
Amax(J), is greater than one [ID] , 

Having solved Eqs. the probability that a vertex of 
type i leads to the giant component through at least one 
of its neighbors is given by P t = 1 — gi(a). Consequently, 
the probability that a randomly chosen vertex does lead 
to the giant component is 


,tv EjSAfJ") 


_ 1 riWigija) 


( 12 ) 


where r* is the probability that a vertex of type i exists. 

As shown in Sec. |TT[ hyperedges may include directed 
edges, or edges that are more likely to exist in one di¬ 
rection than the other [i.e., ptj pji in Eqs. pa|)]. This 
implies that while vertex B is in the neighborhood of 
vertex A, vertex A may not be in the neighborhood of 
vertex B. From such local asymmetries, a global asym¬ 
metry arises between the probability that a vertex leads 
to the giant component, V. and the relative size S of 
the giant component. In such case, the extensive com¬ 
ponent has a “bow-tie” structure B HI] meaning that 
the vertices involved in the extensive component belong 
to one of the three non-overlapping sets I m , I both and 
X out . The set I ln includes vertices that lead to the giant 
component but that cannot be reached from it; these ver¬ 
tices are somehow “hidden” behind directed edges. The 
set X out contains vertices that cannot lead to the giant 
component but that can be reached from it; they are 
positioned downstream of directed edges. The set I both 
contains vertices that lead to the giant component and 
that can be reached from it. From this, we conclude 
V = |I in [JX hoth \/N and S = \X hoth \JX out \/N. 







This perspective offers a direct and intuitive way to 
calculate S: it is the probability that a vertex does not 
lead to the extensive component when the direction of ev¬ 
ery edges is reversed. This edge reversal is fully encoded 
in Q a i(l\n: 1Z) computed with Eqs. ([3]) with incoming di¬ 
rected edges swapped into outgoing ones (and vice versa), 
and with edges that were more likely to exist in a given 
direction now more likely to exist in the opposite direc¬ 
tion (i.e., Pij becomes Pji). From these probabilities, we 
define the pgfs O a i(x), gi{x) and /^(a;) which are anal¬ 
ogous to the ones previously defined [Pj(fc) and R(n) 
remain unchanged]. Defining a^i as the probability that 
a vertex of type i reached by one of its stubs of type 
p does not lead to the giant component in the reversed 
graph ensemble, self-consistency now requires 




(13) 


for every p £ £ and i £ Af. As for Eqs. (|9]) , the so¬ 
lution of this set of equations correspond to the fixed 
point of the corresponding map and can therefore be ob¬ 
tained by successive iterations of any initial condition in 
[0,1] l £ l x l-' v I. The elements of the Jacobian matrix of both 
Eqs. ([9]) and (131 are the average number of pairs, say 
( a,j ), that are in the neighborhood of a pair, say {p,i), 
in their respective graph ensemble. Since both systems, 
Eqs. ([9]) and (13), correspond to different perspectives of 
the same graph ensemble, the two Jacobian matrices are 
linked by a similarity transformation, and therefore have 
the same eigenvalues. Hence the transcritical bifurcation 
occurs simultaneously in both systems. 

Having obtained a from Eqs. (13), the probability for a 


vertex of type i to be part of the giant component is S t = 
1 — gi(a), and the relative size of the giant component is 


i&AT 


riWjSj 

.V r J 


ieJV 


riWjgi{a) 

EjeAfW 


(14) 


obtain a pgf that generates the distribution of the num¬ 
ber of vertices of each type that will be eventually reached 
from a vertex of type i\ the reach of this new pgf is no 
longer limited to the immediate neighborhood. In fact 
this new pgf allows to investigate the composition and 
the sizes of the components that contain a finite number 
of vertices. Let this new pgf be denoted K(z). 

To compute K(z), we first consider the pgf A ai (x) that 
generates the distribution of the number of all pairs of 
each type that will eventually be reached (i.e., not lim¬ 
ited to the first neighbors) from a vertex of type i given 
that this vertex has been reached from one of its stubs 
of type a. In other words, this function generates the 
distribution of the number of the vertices that are even¬ 
tually reached from a pair (a,i). Note that A ai (x) is a 
function of x so that it keeps track of the type of the 
stubs from which each vertex has been reached. Besides 
yielding a tree-like structure, the stub matching scheme 
used to generate graphs implies that the pgfs {f^i{x)} 
are invariant under translations on the graphs in the 
limit N ^ 00 . In other words, while navigating on a 
graph from this ensemble, the number and the type of 
the vertices downstream from any given vertex does not 
depend on the types of the vertices (or the types of the 
stubs) previously encountered; navigating on graphs from 
this ensemble is a stationary Markov process (i.e., it only 
depends on the current position on the graph). Conse¬ 
quently, a vertex of type i reached from one of its stubs 
of type a and a pair (cv,i) present in its neighborhood 
should both lead to a finite tree whose size and composi¬ 
tion are identically distributed; this distribution is gen¬ 
erated by A a i(x). Considering every combination (a,*), 
this self-consistency requirement can be mathematically 
formulated as 


Aai(x') — X a if a i(^A{x^ , (15) 


Clearly, when all hyperedges are symmetric (i.e., pij = 
Pji for every i,j £ Af) there is no global asymmetry 
in the graph ensemble, and V = S. Also, whenever 
Eqs. @ ^ are used in the context of site percolation— 
where vertices exist or are activated with a given set of 
probabilities—the value of V and S is relative to the num¬ 
ber of vertices that exist. In other words, V is the prob¬ 
ability that an existing vertex leads to an extensive com¬ 
ponent, and S is the probability that an existing vertex 
is part of it. 


D. Small components 

Substituting x V j by Zj for every j £ Af and v £ E in 
Eq. 0 yields a pgf that generates the number of ver¬ 
tices of each type that are directly accessible from a ver¬ 
tex of type i (i.e., vertices that are in its neighborhood). 
In other words, the information concerning the types of 
stubs is lost. Using self-consistency arguments similar to 
the one used in the previous subsection, it is possible to 


where the extra x a i accounts for the vertex of type i 
that has been reached through one of its stubs of type 
a. Analogously to the set of probabilities {a a i}, the pgfs 
{A ai (x)} are the fixed point of (with n £ N) 

A( n+1 )(x) =xof(A^ n) (x)) (16) 


where “o” denotes the entry wise product, and where 
f(x) is the same map as in Eq. (10). It is in fact straight¬ 
forward to show that the extra x guarantees that the dis¬ 
tributions generated by A(x) can be obtained for com¬ 
ponents of n vertices or less in n+1 iterations of Eq. (16) 
from the initial condition A^fa) = 1 [i.e., A^{x) = 1 
for every i £ Af and a £ £}. 

Having obtained A(x) up to a sufficient size of com¬ 
ponents, n, the number of vertices of each type that can 
be reached in a finite component from a randomly cho¬ 
sen vertex of type i is generated by Ki{z) = Zigi{A{z)). 
The pgf generating the number of vertices of each type 
that are accessible in a small component from a randomly 
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chosen (existing) vertex is 


K{z) = © 

i&AT 


rjWjK^z) 

Ejga r r o w j 


E 

ieAT 


TiWiZigi{A(z)) 

>-,c .V 'VO 


(17) 


It is worth mentioning that the distributions generated 
by K(z) and {A a i(z)} are not normalized in the pres¬ 
ence of an extensive component as there is a non-zero 
probability that a pair (a, i ) leads to the giant compo¬ 
nent. In fact, comparing Eqs. ( flO] ) and (16) leads to the 
conclusion that A a i( 1) = a a i and that K (1) = 1 — V. 


Similarly, when reversing the direction of edges (directed 
edges now run from C stubs to B stubs), we obtain 

9ai(x) = (1-npn)+ rxpiiXAi (19a) 

9 B i(x) = 1 (19b) 

9ci{x) = (1 - npn) + npuXBi , (19c) 


which yield the pgfs ^i(as), / ai ( x ) and f B i(x) [fci(x) 
is non-defined]. Using Eqs. ((18]) and (fT9() in Eqs. ©- 
© with r± = 1 yields the results obtained in Ref. [201 . 
and to the ones obtained for purely directed [5] or purely 
undirected random graphs m in the appropriate limits. 


IV. SPECIAL CASES AND APPLICATIONS 

To demonstrate the versatility and the flexibility of the 
formalism, we present a series of representative examples. 
This will also clarify the conceptual and numerical steps 
necessary to implement such a general approach. 


A. Semi-directed random graphs 

Semi-directed random graphs are composed of indistin¬ 
guishable vertices connected via undirected and directed 
edges. They were used in Ref. (2D] to study the impact 
of non-reciprocal connections in contact networks on the 
propagation of an emerging infectious disease. These 
non-reciprocal connections accounted for the suscepti¬ 
bility of health-care workers to get infected from infec¬ 
tious individuals seeking treatments in hospitals. Semi- 
directed are also a good first example for they have the 
well-known undirected graphs and directed graphs as spe¬ 
cial cases. 

Every vertices in these graphs belong to the same type 
(|AA| = 1, type 1, w\=l), and there are |£| = 3 types 
of stubs: stubs of type A are paired together to form 
undirected edges, and stubs of type B (outgoing) and 
C (incoming) are paired to form a directed edge. The 
joint degree distribution P\(k) = I\ [k,A,kn,kc) cor¬ 
responds to the distribution of undirected degree, out- 
degree and m-degree. In this scenario, the conditions 
given by Eq. Q imply that there must be as much incom¬ 
ing stubs as there are outgoing stubs, (ks)p 1 = ( kc)p 1 -, 
and they fix the values of R{n) = R(nAi,riBi,nci) in 
terms of the average degrees, i?(2, 0,0) = 1 — R( 0,1,1) = 
{ kA)p 1 /((kA)p 1 + 2(fcs)p 1 ). Assuming that edges exist 
with probability pn and vertices exist with probability 
r i, we find from Eqs. ([3]) and © 

9 A i{x) = (1 - ripn) + npnXAi (18a) 

9 B i{x) = (1 - ripn) + ripnxci (18b) 

9ci{x) = 1 , (18c) 

from which we define the pgfs gi(x ), /ai(®) and fci( x ) 
from Eqs. 0 and ©. Note that fsiix) does not ex¬ 
ist as vertices cannot be reached by an outgoing stub. 


B. Correlated random graphs 

Other interesting special cases of our model are cor¬ 
related random graphs: graphs where vertices are more 
likely to be connected with vertices having specific in¬ 
trinsic properties (e.g., degree, centrality, ethnicity, age 
group, gender). In such cases, there are |7V| types of 
vertices, one for each intrinsic property, and there are as 
many types of stubs: each type of stubs corresponds to 
the type of the vertex that is at the other end of the edge. 
To simplify the notation, types of stubs will be identified 
by the type of the vertex toward which they point (i.e., 
£ = Af). Hence the joint degree distribution Pi(k) pre¬ 
scribes the number of vertices of each type that vertices 
of type i are connected to. The conditions Q ask that 
there are as many stubs stemming from vertices of type 
i toward vertices of type j as in the reverse direction, 
Wi(kj)p i = Wj(ki)p j . These constraints also prescribe 
the distribution 

R(n) = ^7 E “ 5 o,n ji )m(k j )p i , (20) 

i,je AT 

where R' = JV, . /&/ y wp (kj>)p , is simply the normaliza¬ 
tion factor. Assuming that vertices of type i exist with 
probability r,, , and that edges going from a vertex of type 
* to a vertex of type j exist with probability py (i.e., edges 
may be more likely to exist in one direction that in the 
other), we get from Eqs. 0 and © 

9ji{x) — (1 vjpij ) -}- rjPijXij (21a) 

9ji(x) = (1 — fjPji ) + r jPji x ij (21b) 

for i,j £ Af. Using Eqs. © in Eqs. © © and setting 
every r, = 1 yields the results obtained in Ref. EH for 
multitype graphs, which are themselves a generalization 
of several other formalisms B 1221 UK- We have also used 
this approach in Ref. [5] to study the observability of 
random graphs, and in Ref. [16] to define an ensemble of 
graphs with an arbitrary k-core structure. 

C. Degree-correlated random graphs 

An important category of correlations is the one based 
on the degree of vertices [Ml These correlations 
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are encoded in the conditional probability P(d'\d) corre¬ 
sponding to the probability that the neighbor of a vertex 
with a degree d has a degree equal to d'. This can be re¬ 
produced with our formalism by considering that every 
vertex with the same degree are of the same type (i.e., a 
vertex of type i has i neighbors, and consequently {wi} 
corresponds to the degree distribution), and by using the 
following type-specific joint degree distribution 


Pi(k) 


FL'e/V ^i'¬ 


ll • 

je AT 


From Eq. Q, we obtain 


9i{x) 


P 01*) 6 'ii( a; ) > 

ieU 


( 22 ) 


(23) 


where 0 r ; (x) is given by Eq. (21a), and Eq. Q yields 


fn( x ) 


P U\i)8ji( x ) 

i&M 


(24) 


which is independent of the type of the vertex/stub, 
namely l, from which the vertex has been reached. This 
is a direct consequence of the multinomial distribution in 
Eq. (22) and shows that our approach, through the joint 
distribution Pi(k ), can include more detailed correlations 
in the degree of the neighbors of vertices. 

It may be useful at this point to illustrate the precise 
connection with previous works. Consider the quantity 
u i = Eje M P U N)[(1 ~ TjPjj) + r-j Pjjdj j] which under suc¬ 
cessive application of Eqs. <(2TbJ) and ( f2lj ) becomes 
the self-consistent expression 


Ui = ^2 p (iK) [l 1 - r jPji) + r jPji u j ^ (25) 

jeM 

for every i £ Af. Setting every TjPji = 1 — /, with 
0 < / < 1, in this last equation yields Eqs. (5) and (13) of 
Ref- BD3 , while Eq. (8) of Ref. [22] is obtained by setting 
TjPji = 1. Similarly, replacing P(j\i) by jwj/ lw l 

and assuming every Ui = u in Eq. ( [25| yields the results 
of Ref. [8]. More precisely, setting every p :ji = 1 allows 
to retrieve their Eq. (15), and their Eq. (8) is obtained 
by setting every VjPji = q s qb , with 0 < q s ,qb < 1- Ex¬ 
pressions for the size of the extensive component derived 
in Refs. [HI 22, 3D] can be obtained from our formalism 
similarly. 


D. Clustered random graphs 

We now show how many variants of the CM containing 
clustered hyperedges (i.e., hyperedges that contain loops) 
are special cases of the approach presented in this paper. 

Since the clustering property is related to the num¬ 
ber of triangles found in graphs—hence capturing the 


idea that the friend of my friend is also my friend — 
it is natural to introduce clustering in graphs through 
the use of triangles (i.e., three vertices all connected 
together) [24] [25] [271 . The simplest clustered graph 
ensemble then has \AF\ = 1 types of vertices (type 1, 
W\ = 1), and |£| = 2 types of stubs: two stubs of type 
A are paired to form undirected edges and three stubs 
of type B are matched to create triangles. Note that 
only one stub of type B is required to belong to a trian¬ 
gle even though its contribution amounts to two to the 
degree of the vertex; stubs can be seen as a member¬ 
ship to an hyperedge. The constraints given by Eq. Q 
de facto set the values of R(n) = R{n A i,riBi) since 

R( 2,0) = 1 - 22(0,3) = 3(k A )p 1 /(3(k A }p 1 + 2 (k B ) Pl ). 

Assuming that vertices and edges exist with probabili¬ 
ties r\ and pn, we obtain from Eqs. (J3| and § 

0 A i{x) = (1 - ripn) + ripnx A i (26a) 

9 B i{x) = (1 - ripn) 2 + 2npu[l - ripu(2 - pn)\x B i 
+ rlp 2 u[i ~ 2pn]x 2 B1 . (26b) 

Using these two functions in Eqs. ([7]) |l7| ) leads di¬ 

rectly to the results obtained in Ref. [211 1251128j . Simi¬ 
larly, the results of Ref. m can be obtained with three 
types of vertices, A f = {1,2,3}, and one type of stubs, 
£ = {A}, where all hyperedges are triangles containing 
one vertex of each type [22(1,1,1) = 1 and d Ai (x) = 

X A lX A 2X A 3 / . 

Besides triangles, clustering—or any digression from a 
perfect tree-like structure—has been introduced in ran¬ 
dom graphs through the inclusion of various categories 
of hyperedges that involve more than three vertices. 
For instance, in Ref. m\mm clustering is incorpo¬ 
rated through fully connected hyperedges, or cliques , 
where vertices or edges exist with given probabilities (i.e., 
Erdos-Renyi graphs). In all cases, there is only one type 
of vertices. We retrieve the model of Ref. [24] by us¬ 
ing one type of stubs; P\{k A ) prescribes the number of 
cliques to which vertices belong, and R{n A i) prescribes 
the size of cliques (respectively the distributions r m and 
s n in Ref. [24]). In the model considered in Refs. mug, 
vertices belong to only one clique, but can have many sin¬ 
gle edges. Since the number of single edges and the size of 
the clique can be correlated in the original model, there 
is one type of stubs for each clique size and an additional 
type for single edges; cliques of size m are formed by 
matching m stubs of the type assigned to cliques of size 
to. Hence the structure of the graphs is fully prescribed 
by P\{k) whose argument indicates the number of single 
edges and the size of the clique. The constraints Q then 
yield 


, (27) 

R n V 

where R" = normalization con¬ 

stant. Using these distributions and quantities, our 
model reproduces the ones presented in Refs. (T41 ITS] . 
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FIG. 4. (Color online) Comparison of the emergence of the 
giant component in a clustered graph ensemble that qualifies 
for the strong clustering regime with its unclustered counter¬ 
part. We see that the latter has a larger giant component and 
a lower percolation threshold. Details on the graphs used are 
given in Sec. |IV E| 


Also, we have used a version of our model that is similar 
to the one introduced in Ref. m to uncover a transition 
in the effectiveness of immunization strategies j5]. 

Finally, two of the most versatile models published to 
date are also special cases of our model. Reference m is 
a previous version of the model presented in this paper. 
The two main differences are that the previous version 
did not handle site percolation, and that only stubs of 
same type could be matched to create hyperedges (e.g., 
forbidding directed edge between vertices of a same type). 
The model introduced in Ref. HU can be retrieved from 
our model with one type of vertices (|A/”| = 1) and with 
one type of stubs for each role that a vertex can play in 
hyperedges. However, this approach lacks the systematic 
method offered by Eqs. (|3j) to solve percolation on each 
hyperedge beforehand, thereby limiting the number of 
hyperedges that can effectively be handled analytically. 


E. Weak and strong clustering regimes 

We now use our model to test a conjecture regarding 
the effect of clustering (e.g., triangles) on bond perco¬ 
lation. References EH EH proposed that clustering has 
opposite effects on the bond percolation threshold and 
on the size of giant component depending of the den¬ 
sity of triangles in a graph. This density is measured 
through the degree dependent clustering coefficient c(k): 
the probability that two neighbors of a vertex of degree 
k are also neighbors (i.e., they complete the triangle). 
The conjecture states that the weak clustering regime 
c{k) < (1 — fc) -1 leads to a higher percolation threshold 
and to a smaller giant component than in an equivalent 
unclustered graph. Contrariwise, strong clustering, de¬ 
fined as c(k) > (1 — A:) -1 , leads to a lower percolation 


threshold and to a larger giant component than in an 
equivalent unclustered graph. 

Let us consider the following graph ensemble in which 
there are two types of vertices J\f = {1,2} and three types 
of edges £ = {A, B,C}. Every vertex of type 1 has one 
stub of type A and one stub of type B, while each ver¬ 
tex of type 2 has one stub of type B and one stub of 
type C. In other words, we set Pj(fc) = Pi(kA,ks,kc ) 
as Pi(l,l,0) = P2(0, 1,1) = 1.0. Hyperedges are 
formed by matching either 4 stubs of type A, 4 stubs 
of type B (two stemming from vertices of type 1 and two 
from vertices of type 2), or 8 stubs of type C; vertices 
are all connected to one another in every hyperedges. 
The constraints given by Eq. Q imply that w\ = W 2 
and that R{n) = R(nAi,riBi,nci,nA 2 ,nB 2 ,nc 2 ) fol¬ 
lows the relation 2P(4,0, 0,0,0,0) = P(0,2, 0,0, 2, 0) = 
4i?(0,0, 0, 0,0,8) = 4/7. Vertices of type 1 all have a de¬ 
gree equal to 6, and vertices of type 2 all have a degree 
equal to 10. Consequently, we see that c(6) — > \ and 

c(10) = || > jj, which implies that this graph ensemble 
qualifies for the strong regime. 

To isolate the effect of clustering on bond percola¬ 
tion, we compare the results obtained for the graph 
ensemble described above with the ones obtained with 
an equivalent unclustered version m- This equivalent 
graph ensemble possesses identical correlations, but hy¬ 
peredges are broken into individual independent edges 
instead (e.g., each vertex in an hyperedge containing n 
vertices now have n — 1 independent edges). The behav¬ 
ior of its giant component is obtained as in Sec. IV B with 
P 1 (4,2,0) = P 2 (0,2,8) = 1. 


Figure [4] compares the behavior of the giant component 
in both ensembles when edges exist with probability pn. 
We conclude that although the clustered graph ensemble 
qualifies for the strong regime, the behavior observed is 
the one of the weak regime: higher percolation thresh¬ 
old and smaller giant component than for the equivalent 
unclustered graph. This behavior can be understood in 
terms of branching factors. The unclustered graphs have 
a tree-like structure and therefore maximize the number 
of new vertices encountered while navigating the graph: 
every edge leads to a new vertex. The redundancy caused 
by clustering means that not all edges lead to a new ver¬ 
tex in the clustered graphs, which reduces the average 
number of vertices that can be reached from any given 
vertex. Hence a larger number of edges must be present 
for a giant component to appear (e.g., larger threshold), 
and this component will be smaller as many edges will 
be wasted by leading to vertices previously reached. 

This counterexample suggests that the criterion on 
c(k) could be a necessary condition for a strong clustering 
regime but that it is not a sufficient one. The explana¬ 
tion in terms of branching factors alongside the results 
in Refs. mmm point toward the conclusion that the 
effect of clustering on random graphs with an underlying 
tree-like structure is best described by the weak cluster¬ 
ing regime. Indeed, Ref. [13 has recently shown that 
strong clustering may induce a double continuous phase 
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transition which conciliates the conjectured antagonistic 
effects of weak and strong clustering, and whose effects 
are in line with our conclusion. 


F. Bijection between site and bond percolation 
thresholds 


From Eqs. (8|, we see that the elements of the Jacobian 
matrix J used to determine the point at which the giant 
component appears have the general form 


d/ M i( 1 ) _ (Mfcq ~ <W)Pi dd ia (l) 

^ x vj a&£ (k^Pi dx V j 


(28) 


for every i, j £ J\f and /i, v £ £. These terms are in fact 
branching factors: each element is the average number 
of pairs {u, j) that are present in the neighborhood of a 
pair (/x,z). More precisely, the first term corresponds to 
the average number of stubs of type a that a vertex of 
type i has if it has been reached from one of its stubs of 
type p (this stub is excluded from the count if a = p). 
The second term is the average number of pairs (u, j) 
that can be reached in hyperedges accessed via a stub of 
type a of a vertex of type i. The value of these latter 
terms depends on the structure of hyperedges (i.e., rules 
7 Z) and on the probabilities for vertices and edges to exist 
(i.e., {rj} je M and {; Vjk}j,keM )■ 

Let us assume that all hyperedges have the same struc¬ 
ture; vertices of different types may be involved in a non¬ 
trivial manner as long as all hyperedges have the same 
shape (e.g., they all are triangles). We also suppose that 
vertices and edges exist with probabilities that are inde¬ 
pendent of their type, that is = r and pij = p for all 
i,j £ AT. In such case, every nonzero elements of the Ja¬ 
cobian matrix, ^ , is a polynomial in r and p , h(r,p ), 
and is independent of i, j, a and v. Consequently the de¬ 
pendency in r and p can be factored out of the Jacobian 
matrix 


J = h(r,p) J' . 


(29) 


Since the giant component appears when A max (J) = 
h(r,p) A max (J') = 1, the points (r',p') at which the phase 
transition occurs all belong to the critical surface 


h(r',p') = 


1 


Amax(J0 


(30) 


Whenever the Jacobian matrix can be written like in 


Eq. (291, any given point (ri,pi) at which a graph en¬ 


semble is known to percolate can be related to any other 
critical point ( 7 - 2 ,£> 2 ) through h(ri,pi) = h(r 2 ,P 2 )- For 
instance, this relation leads to a direct bijection between 
the thresholds of pure site percolation (r c , 1) and pure 
bond percolation (1 ,p c ) through h(r c , 1) = h(l,p c ). Ad¬ 
ditionally, h{r,p) = rp for unclustered correlated ran¬ 
dom graphs, and the fact that r and p only appear as rp 
in Eqs. (211 implies that site and bond percolation are 


equivalent for this random graph ensemble. 


V. INTERDEPENDENT RANDOM GRAPHS 

In this last section, we briefly show how our approach 
can be adapted to model interdependent graphs through 
a redefinition of Eqs. and use the resulting for¬ 

malism to investigate the emergence of an extensive com¬ 
ponent on interdependent clustered random graphs. 

To lighten the description, we consider the case of two 
interdependent graphs, graph A and graph B, without 
loss of generality (guidelines for a straightforward gen¬ 
eralization to an arbitrary number of graphs are given 
in Ref. J2])- We assume that every edge in each graph 
is undirected such that there is no global asymmetry: 
the probability that a randomly chosen vertex leads to 
the extensive component is equal to the relative size of 
the extensive component (i.e., V = S ). Furthermore, 
we consider that the pgfs gt(x) and and all other 

related quantities defined in the previous sections (i.e., 
{wi}, 1Z , A f, {ri}, ...) are known for both graphs and 
are identified with the superscript A or B. Both graphs 
contain the same number of vertices which tends to in¬ 
finity: N A = N b = N —> 00 . 

The change in the nature of the transition (i.e., from 
continuous to discontinuous) originates from the exis¬ 
tence of dependencies between vertices of the two graphs. 
Again, to lighten the description, we consider the case in 
which each vertex has either one twin vertex on which it 
depends, or none. To specify the dependencies between 
vertices, we define q AB as the probability that a vertex of 
type i in graph A has a twin vertex of type v in graph B. 
Note that allowing graphs to be partially dependent—not 
all vertices have a twin vertex—implies that 

E q AB = 1 - q AB < 1 , (31) 

veM B 

for each i £ J\f A , and where the sum is over the types of 
the vertices in graph B. Therefore, a fraction q AB of the 
vertices of type i in graph A do not have a twin vertex in 
graph B. A similar set of probabilities, {qf^}j,u with j £ 
J\f B and u £ N A , is defined to specify the dependencies of 
vertices in graph B. Moreover, we add the constraint that 
the dependency between two vertices must be reciprocal 
unless a vertex’s twin has no dependency whatsoever. In 
other words, if vertex n A in graph A depends on vertex 
n B in graph B, then either vertex n B depends on vertex 
n A as well or it depends on no vertex at all. In the latter 
case, vertex n A is the only vertex in graph A that can 
depend on vertex n B . This constrains the two probability 
sets, {q AB }i, v and {qf A }j,u, as there must be enough 
“independent” vertices in graph A (graph B) to account 
for the vertices in graph B (graph A) whose dependency 
is not reciprocal. Mathematically, these conditions can 
be written as 

max { NA wfqfv B - N B w B q BA , 0 } < N B w B q BA 

ieM A 


(32) 
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for each u £ J\f A and v £ J\f B . A similar expression in 
which the superscripts A and B are swapped must also 
hold. In the expression above, N A w A q AB corresponds to 
the number of vertices of type i in graph A that depend 
on a vertex of type v in graph B, and N B w B q BA is the 
number of vertices of type v in graph B that have no 
dependency. 

A discontinuous phase transition is associated with the 
emergence of an extensive functional component: an ex¬ 
tensive component composed solely of vertices with no 
dependency or whose twin vertex is part of the exten¬ 
sive functional component in its respective graph. To 
compute the size of the extensive functional component, 
we define a A t as the probability that a vertex of type i 
reached from one of its stubs of type g in graph A does 
not lead to the functional extensive component in graph 

A. Similar probabilities are defined for the other types of 
vertices and stubs in graph A and in graph B. Following 
the locally tree-like structure argument of Sec. |III C'[ we 
now derive a set of self-consistent equations similar to 
Eqs. ([9]) for these probabilities. 

Let us consider the case of a vertex of type i in graph A 
reached via one of its stubs of type g. By definition, this 
vertex belongs to the extensive functional component in 
graph A with probability 1 — a^. Only two scenarios 
can lead to this situation. The first one consists in the 
vertex being part of the extensive functional component 
in graph A and having no twin vertex. This happens 
with probability [1 — f^i(a A )]q AB . The second scenario 
consists in the vertex being part of the extensive func¬ 
tional component in graph A and having a twin vertex 
that is part of the extensive functional component in its 
own graph. This scenario happens with probability 

1 -f£i( aA )\ 55 ^t Bir v[ l -9v( aB )\ » (33) 

v€jV B 

where q AB is the probability that the twin vertex is of 
type v, r B is the probability that it exists (i.e., it has not 
been removed), and 1 — g B (a B ) is the probability that it 
belongs to the extensive functional component in graph 

B. Summing these two scenarios yields 


[11 


= 1 - 



+ 51 -g B (a B )\ 

vG1T b 


(34) 


which must hold for every i £ M A and g £ S A , as well as 
for graph B (i.e., simply swap the superscripts A and B 
in the last expression). Having solved Eqs. (34) for the 
probabilities a A = {a A t } and a B = {a® }, the probabil¬ 
ity that a randomly chosen vertex of type i in graph A 
belongs to the functional component is 


S A 


1 - af{a A )] [qf B 

+ 55 <l?v BrB [ l - 9 B {a B )) 

veJV B 


(35) 


which is similar to the calculation of Si in Sec. |III C'[ but 
in this case the probability that a vertex of type i belongs 
to the functional extensive component, 1 — g A (a A ), is 
weighted by the probability that its twin vertex, if any, 
belongs to the extensive functional component as well. 
Averaging over the fraction of existing vertices of each 
type (e.g., a fraction r A w A of vertices in graph A corre¬ 
sponds to vertices of type i that have not been removed), 
we finally obtain the size of the extensive functional com¬ 
ponent in graph A 


S A = E y 

ifztfA z^jeif A ■ j “j 


r A in A 

^i ^i gA 


A A 4 


(36) 


Similar equations for vertices in graph B are obtained 
by swapping the superscripts A and B in the last two 
expressions. As for the quantities V and S defined 
previously, the fraction S A (and S B ) is relative to the 
number of existing vertices (i.e., vertices that have not 
been removed). Note also that Eqs. ([9]) are retrieved 
from Eqs. (34) if there are no dependencies. How¬ 
ever, contrariwise to Eqs. ([9]) and Eqs. (13), the right- 
hand side of Eqs. ( f34| ) does not necessarily correspond 
to monotonously increasing functions (some coefficients 
in the polynomials are negative). This implies that, al¬ 
though the point a A ® a B = 1 is still a solution, another 
solution in the hypercube [0,1] l- v l x l £ l x l £ I corre¬ 

sponding to the presence of an extensive functional com¬ 
ponent may not appear continuously from a A © a B = 1 
through a transcritical bifurcation as in Sec. |III C| Hence 
the values of S A and S B jump abruptly from zero to a 
finite value in [0,1] which corresponds to a discontinuous 
phase transition. 

To illustrate this behavior, we investigate the emer¬ 
gence of extensive components on interdependent clus¬ 
tered random graphs. To do so, we consider the edge- 
triangle clustered graph ensemble presented in Sec. |IVD| 
with the joint degree distribution Pi (0,3) = Pi (2,1) = 
2Pi(2, 0) = 4/10 and pn = 1.0. Notice that this joint de¬ 
gree distribution forces assortative mixing since high and 
low degree vertices tend to be segregated. The size of 
the extensive component in this isolated random graph 
ensemble is given as a function of the vertex existence 
probability r\ in Fig. [ 5 ] (black curve labeled <S( Cji )). 

To illustrate the impact of interdependence on the 
phase transition, we consider the case of two identical 
partially dependent edge-triangle clustered graphs with 
q AB = 0.6 and q BA = 1.0. In other words, only 60% of 
the vertices in graph A depend on a vertex in graph B 
whereas every vertex in graph B has a twin vertex. The 
green curves labeled S A ^ and S B d j in Fig. |5| show the 
size of the extensive functional component in graph A and 
graph B as a function of rq. We see that the extensive 
functional components indeed emerge through a discon¬ 
tinuous phase transition, unlike the extensive component 
in the isolated clustered graphs that emerges continu¬ 
ously. We also see that the extensive functional compo¬ 
nent in graph A is always bigger than the one on graph 
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FIG. 5. (Color online) Comparison of the size of the extensive 
(functional) components as a function of the vertex existence 
probability r i in four related graph ensembles. The details 
of each graph ensemble are given in the main text of Sec. [V] 
The curves iS( Ci ;) and 5(„,i) were obtained with Eq. (141 and 
the curves Sf c<d) , S B C}d) , Sf u , d) and Sf uA) were obtained with 
Eq. ( |36| ). Symbols show the results of numerical simulations 
on these graph ensembles with N = 10 6 vertices, (inset) Com¬ 
parison between the curve S/J, d ^ and the curve S( U} i) rescaled 
according to = q\ B S( u ^ [r\qf B ] (the dependence to 

ri in shown in brackets). 


B since vertices with no dependency are more likely to 
be in the extensive functional component than vertices 
with a dependency. Indeed, a vertex in graph A that 
has no dependency belongs to the extensive functional 
component with probability [1— gi(a A )\ which is clearly 
greater than [1 — g A (a A )\rf [1 — gf{a B )\ for a vertex 
with a dependency since 0 < rf [1 — g B (a B )\ < 1 (the 
second equality holds only when there is no extensive 
component and r B = 1). Figure [H] also shows that the 
size of the extensive functional component in the inter¬ 
dependent graphs is bounded by the size of the giant 
component in the corresponding isolated graphs. Again, 
this is expected since being part of the extensive com¬ 
ponent is a sine qua non condition for being part of the 
extensive functional component. 


As in Sec. ITveI we consider the unclustered version 
of the edge-triangle random graph ensemble—in which 
triangles are broken into two independent single edges— 
to isolate the impact of clustering. In order to preserve 
the correlations present in the clustered dependent graph 
ensemble, two types of stubs are used to distinguish the 
original single edges from the single edges due to the 
broken triangles. As expected, the extensive component 
in this isolated unclustered graph ensemble appears at 
a lower value of rq (red curve labeled S in Fig. [5l) 
than for the isolated clustered random graph ensemble 
(see Sec. IVE). The same conclusion holds for the ex¬ 
tensive functional component in the unclustered version 
of the two partially dependent graphs described in the 
last paragraph (blue curves labeled d ^ and S B d ^ in 


Fig-§. Comparing the size of the extensive functional 
component in the clustered and unclustered versions of 
these interdependent graphs also suggests that clustering 
increases the jump size at the transition. 

One interesting observation from Fig. [5]is that graph A 
in the interdependent unclustered graph ensemble (curve 
labeled S A d) ) successively undergoes two phase tran¬ 
sitions: a continuous then a discontinuous one. While 
the discontinuous transition is caused by the interdepen¬ 
dency with graph B, the continuous one is due to the fact 
that the 40% of vertices in graph A that have no depen¬ 
dency are able to form an extensive component before 
the discontinuous phase transition occurs. From Fig. [5j 
we see that the continuous phase transition happens at 
rq ~ 0.23 in the isolated unclustered graph ensemble, 
while the phase transitions occur at rq ~ 0.58 (contin¬ 
uous) and at rq ~ 0.68 (discontinuous) in the interde¬ 
pendent unclustered graph ensemble. Below rq ~ 0.68, 
the vertices in graph A that depend on a vertex in graph 
B can be effectively considered as removed since there is 
no extensive functional component in graph B (i.e., they 
cannot be part of an extensive component). Hence we 
expect graph A to behave as its isolated graph ensemble 
counterpart in which an effective fraction l — r\q AB of its 
vertices have been removed. This is indeed confirmed in 
the inset of Fig. [5] In fact, successive phase transitions 
should occur whenever independent vertices are able to 
form an extensive component before the extensive func¬ 
tional component emerges. In other words, we observe 
a double phase transition whenever the rescaled value 
at which the continuous phase transition happens in the 
isolated graph ensemble is below the value at which the 
discontinuous phase transition occurs in the interdepen¬ 
dent graph ensemble. 


VI. CONCLUSION 

Building upon our previous works m noi nn ns eh , we 
have presented a unifying conceptual framework that of¬ 
fers a comprehensive mathematical description of a wide 
variety of structural properties found in graphs extracted 
from real complex systems (e.g., correlations, segrega¬ 
tion, clustering of various forms). The generality of the 
formalism resides on a multitype perspective for a pre¬ 
cise prescription on how vertices are connected to one 
another, and on a set of iterative equations for the solu¬ 
tion of the distribution of the size of components in small 
arbitrary graphs. Interestingly, these iterative equations 
are by themselves a valuable addition to graph theoreti¬ 
cal methodology. In fact, besides being a cornerstone of 
our formalism, allowing a mapping of hyperedges unto 
an effective tree-like structure, they also have potential 
applications in the theoretical description of fragmenta¬ 
tion processes and of percolation on lattices I47H5D] (see 
Ref. [36] for further details). 

Our approach leads to the definition of a very gen¬ 
eral random graph ensemble for which site and/or bond 
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percolation can be solved exactly using probability gener¬ 
ating functions in the infinite size limit (e.g., size of the 
giant component, percolation threshold, distribution of 
the size of small components). We have shown that this 
random graph ensemble encompasses most random graph 
models published until now and can incorporate struc¬ 
tural properties not yet included in a theoretical frame¬ 
work. This versatility makes it a perfect theoretical lab¬ 
oratory to investigate the role of specific local structural 
properties on the global connectivity of the graphs. We 
have illustrated this point by implementing our method 
to provide a counterexample to a conjecture m on the 
effect of clustering on the size of the giant component 
and on the percolation threshold. 

Our formalism is also naturally equipped for the mod¬ 
eling of interdependent graphs whose most striking fea¬ 
ture is the emergence of the extensive component via 
a discontinuous phase transition. We have provided a 
specific implementation for this application that demon¬ 
strates how a graph can successively undergo a continu¬ 
ous then a discontinuous phase transition, and how clus¬ 
tering increases the amplitude of discontinuity at the 
transition. 


By offering one of the most comprehensive mathemat¬ 
ical description of percolation on random graphs, we 
believe that the present work will contribute to a bet¬ 
ter understanding of the interplay between local struc¬ 
tural properties and the global connectivity of graphs. 
Moreover, our approach can easily accommodate other 
types of dynamics for which the pgf technique has al¬ 
ready proven to be useful H [53 G2]. We are hopeful 
that several extensions (different dynamics and/or per¬ 
colation models) will shed further light on the role of 
structure in the behavior of complex systems. We put 
forward that some of the tools to perform these studies 
are now available. 
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